Data Augmenting Contrastive Learning of Speech Representations in the Time Domain

Author: Douze Matthijs
Dupoux Emmanuel
Kharitonov Eugene
Mazaré Pierre-Emmanuel
Rivière Morgane
Synnaeve Gabriel
Wolf Lior
Publication date: 02/07/2020

Contrastive Predictive Coding (CPC), which predicts future segments of speech from past segments, is emerging as a powerful algorithm for representation learning of speech signals. However, it still underperforms other methods on unsupervised evaluation benchmarks. Here, we introduce WavAugment, a time-domain data augmentation library, and find that applying augmentation in the past is generally more efficient and yields better performance than other methods. We find that a combination of pitch modification, additive noise and reverberation substantially increases the performance of CPC (relative improvement of 18-22%), beating the reference Libri-light results with 600 times less data. Using an out-of-domain dataset, time-domain data augmentation can push CPC to be on par with the state of the art on the Zero Speech Benchmark 2017. We also show that time-domain data augmentation consistently improves downstream limited-supervision phoneme classification tasks by 12-15% (relative).

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Data Augmenting Contrastive Learning of Speech Representations in the Time Domain

Author: Douze Matthijs
Dupoux Emmanuel
Kharitonov Eugene
Mazaré Pierre-Emmanuel
Rivière Morgane
Synnaeve Gabriel
Wolf Lior
Publication venue: HAL CCSD
Publication date: 13/12/2020

Contrastive Predictive Coding (CPC), which predicts future segments of speech from past segments, is emerging as a powerful algorithm for representation learning of speech signals. However, it still underperforms other methods on unsupervised evaluation benchmarks. Here, we introduce WavAugment, a time-domain data augmentation library, and find that applying augmentation in the past is generally more efficient and yields better performance than other methods. We find that a combination of pitch modification, additive noise and reverberation substantially increases the performance of CPC (relative improvement of 18-22%), beating the reference Libri-light results with 600 times less data. Using an out-of-domain dataset, time-domain data augmentation can push CPC to be on par with the state of the art on the Zero Speech Benchmark 2017. We also show that time-domain data augmentation consistently improves downstream limited-supervision phoneme classification tasks by 12-15% (relative).

INRIA a CCSD electronic archive server

Reference-less Quality Estimation of Text Simplification Systems

Author: Bordes Antoine
Humeau Samuel
Martin Louis
Mazaré Pierre-Emmanuel
Sagot Benoît
Villemonte de La Clergerie Éric
Publication venue: HAL CCSD
Publication date: 01/01/2018

The evaluation of text simplification (TS) systems remains an open challenge. As the task has common points with machine translation (MT), TS is often evaluated using MT metrics such as BLEU. However, such metrics require high-quality reference data, which is rarely available for TS. TS has the advantage over MT of being a monolingual task, which allows direct comparisons between the simplified text and its original version. In this paper, we compare multiple approaches to reference-less quality estimation of sentence-level text simplification systems, based on the dataset used for the QATS 2016 shared task. We distinguish three different dimensions: grammaticality, meaning preservation and simplicity. We show that n-gram-based MT metrics such as BLEU and METEOR correlate the most with human judgment of grammaticality and meaning preservation, whereas simplicity is best evaluated by basic length-based metrics.

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

Libri-Light: A Benchmark for ASR with Limited or No Supervision

Author: Collobert Ronan
Dupoux Emmanuel
Fuegen Christian
Joulin Armand
Kahn Jacob
Karadayi Julien
Kharitonov Evgeny
Likhomanenko Tatiana
Liptchinsky Vitaliy
Mazaré Pierre-Emmanuel
Mohamed Abdelrahman
Rivière Morgane
Synnaeve Gabriel
Xu Qiantong
Zheng Weiyi
Publication venue: HAL CCSD
Publication date: 20/12/2019

We introduce a new collection of spoken English audio suitable for training speech recognition systems under limited or no supervision. It is derived from open-source audio books from the LibriVox project. It contains over 60K hours of audio, which is, to our knowledge, the largest freely-available corpus of speech. The audio has been segmented using voice activity detection and is tagged with SNR, speaker ID and genre descriptions. Additionally, we provide baseline systems and evaluation metrics working under three settings: (1) the zero resource/unsupervised setting (ABX), (2) the semi-supervised setting (PER, CER) and (3) the distant supervision setting (WER). Settings (2) and (3) use limited textual resources (10 minutes to 10 hours) aligned with the speech. Setting (3) uses large amounts of unaligned text. They are evaluated on the standard LibriSpeech dev and test sets for comparison with the supervised state of the art.

LIBRI-LIGHT: A Benchmark for ASR with Limited or No Supervision

Author: Mohamed Abdelrahman
Collobert Ronan
Dupoux Emmanuel
Fügen Christian
Joulin Armand
Kahn Jacob
Karadayi Julien
Kharitonov Eugene
Likhomanenko Tatiana
Liptchinsky Vitaliy
Mazaré Pierre-Emmanuel
Rivière Morgane
Synnaeve Gabriel
Xu Qiantong
Zheng Weiyi
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 04/05/2020

We introduce a new collection of spoken English audio suitable for training speech recognition systems under limited or no supervision. It is derived from open-source audio books from the LibriVox project. It contains over 60K hours of audio, which is, to our knowledge, the largest freely-available corpus of speech. The audio has been segmented using voice activity detection and is tagged with SNR, speaker ID and genre descriptions. Additionally, we provide baseline systems and evaluation metrics working under three settings: (1) the zero resource/unsupervised setting (ABX), (2) the semi-supervised setting (PER, CER) and (3) the distant supervision setting (WER). Settings (2) and (3) use limited textual resources (10 minutes to 10 hours) aligned with the speech. Setting (3) uses large amounts of unaligned text. They are evaluated on the standard LibriSpeech dev and test sets for comparison with the supervised state of the art. Index Terms: unsupervised and semi-supervised learning, distant supervision, dataset, zero- and low-resource ASR

Crossref

INRIA a CCSD electronic archive server

164th Infantry News: March 2005

Author: Dupoux Emmanuel
Joulin Armand
Mazaré Pierre-Emmanuel
Rivière Morgane
Publication venue: UND Scholarly Commons
Publication date: 01/03/2005

March 2005 edition of the 164th Infantry News. The issue totals 16 pages of news articles, event notices, photographs, and personal memories from veterans of the 164th Infantry Regiment. https://commons.und.edu/infantry-documents/1069/thumbnail.jp

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

UND Scholarly Commons (University of North Dakota)